Similar resources
Ambiguity-aware Document Similarity
In recent years, great advances have been made in the speed, accuracy, and coverage of automatic word sense disambiguator systems that, given a word appearing in a certain context, can identify the sense of that word. In this paper we consider the problem of deciding whether the same words contained in different documents are related to the same meaning or are homonyms. Our goal is to improve the e...
Crowdsourcing Ambiguity-Aware Ground Truth
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, this assumption often creates is...
Document Similarity Judgment for Interactive Document Clustering
This paper investigates the task of document similarity judgment for interactive document clustering. We suppose one of the promising approaches for developing the next generation of web search engines is to incorporate a user feedback mechanism into constrained clustering. As a basis for designing such search engines, it is important to study the interface design that can reduce users' burden of givi...
Context Aware Document Embedding
Recently, doc2vec has achieved excellent results in different tasks (Lau and Baldwin, 2016). In this paper, we present a context aware variant of doc2vec. We introduce a novel weight estimating mechanism that generates weights for each word occurrence according to its contribution in the context, using deep neural networks. Our context aware model can achieve similar results compared to doc2vec...
Stacked Similarity-Aware Autoencoders
As one of the most popular unsupervised learning approaches, the autoencoder aims at transforming the inputs to the outputs with the least discrepancy. The conventional autoencoder and most of its variants only consider the one-to-one reconstruction, which ignores the intrinsic structure of the data and may lead to overfitting. In order to preserve the latent geometric information in the data, ...
Journal
Journal title: International Journal on Natural Language Computing
Year: 2013
ISSN: 2319-4111, 2278-1307
DOI: 10.5121/ijnlc.2013.2302